
Commit b5e9420

[INLONG-1814] Show document file subdirectories and change the document directory level (#190)
1 parent a279898 commit b5e9420

35 files changed (+260 -260 lines)

docs/modules/agent/architecture.md (+6 -6)

@@ -2,27 +2,27 @@
 title: Architecture
 ---
 
-## 1. Overview of InLong-Agent
+## 1 Overview of InLong-Agent
 InLong-Agent is a collection tool that supports multiple types of data sources, and is committed to achieving stable and efficient data collection functions between multiple heterogeneous data sources including file, sql, Binlog, metrics, etc.
 
-### The brief architecture diagram is as follows:
+### 1.1 The brief architecture diagram is as follows:
 ![](img/architecture.png)
 
-### design concept
+### 1.2 design concept
 In order to solve the problem of data source diversity, InLong-agent abstracts multiple data sources into a unified source concept, and abstracts sinks to write data. When you need to access a new data source, you only need to configure the format and reading parameters of the data source to achieve efficient reading.
 
-### Current status of use
+### 1.3 Current status of use
 InLong-Agent is widely used within the Tencent Group, undertaking most of the data collection business, and the amount of online data reaches tens of billions.
 
-## 2. InLong-Agent architecture
+## 2 InLong-Agent architecture
 The InLong Agent task is used as a data acquisition framework, constructed with a channel + plug-in architecture. Read and write the data source into a reader/writer plug-in, and then into the entire framework.
 
 + Reader: Reader is the data collection module, responsible for collecting data from the data source and sending the data to the channel.
 + Writer: Writer is a data writing module, which reuses data continuously to the channel and writes the data to the destination.
 + Channel: The channel used to connect the reader and writer, and as the data transmission channel of the connection, which realizes the function of data reading and monitoring
 
 
-## 3. Different kinds of agent
+## 3 Different kinds of agent
 ### 3.1 file agent
 File collection includes the following functions:
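
Note: for orientation, the reader/channel/writer split described in this file can be pictured with the schematic below. This is an illustrative sketch only; the interface names and the blocking-queue channel are assumptions for explanation, not the actual inlong-agent plug-in API.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Schematic of the channel + plug-in architecture: a Reader collects records
// from a source and pushes them into a Channel, a Writer drains the Channel
// and delivers the records to the destination.
public class AgentPipelineSketch {

    interface Reader {                // data collection plug-in
        String read() throws InterruptedException; // returns null when the source is drained
    }

    interface Writer {                // data delivery plug-in
        void write(String record);
    }

    static void run(Reader reader, Writer writer) throws InterruptedException {
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(1024); // the connecting Channel
        String record;
        while ((record = reader.read()) != null) {
            channel.put(record);          // Reader -> Channel
            writer.write(channel.take()); // Channel -> Writer
        }
    }
}
```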

docs/modules/agent/quick_start.md (+9 -9)

@@ -2,15 +2,15 @@
 title: Build && Deployment
 ---
 
-## 1Configuration
+## 1 Configuration
 ```
 cd inlong-agent
 ```
 
 The agent supports two modes of operation: local operation and online operation
 
 
-### Agent configuration
+### 1.1 Agent configuration
 
 Online operation needs to pull the configuration from inlong-manager, the configuration conf/agent.properties is as follows:
 ```ini
@@ -20,25 +20,25 @@ agent.manager.vip.http.host=manager web host
 agent.manager.vip.http.port=manager web port
 ```
 
-## 2run
+## 2 run
 After decompression, run the following command
 
 ```bash
 sh agent.sh start
 ```
 
 
-## 3Add job configuration in real time
+## 3 Add job configuration in real time
 
-#### 3.1 agent.properties Modify the following two places
+### 3.1 agent.properties Modify the following two places
 ```ini
 # whether enable http service
 agent.http.enable=true
 # http default port
 agent.http.port=Available ports
 ```
 
-#### 3.2 Execute the following command
+### 3.2 Execute the following command
 ```bash
 curl --location --request POST 'http://localhost:8008/config/job' \
 --header 'Content-Type: application/json' \
@@ -78,7 +78,7 @@ agent.http.port=Available ports
 - proxy.streamId: The streamId type used when writing proxy, streamId is the data flow id showed on data flow window in inlong-manager
 
 
-## 4eg for directory config
+## 4 eg for directory config
 
 E.g:
 /data/inlong-agent/test.log //Represents reading the new file test.log in the inlong-agent folder
@@ -87,7 +87,7 @@ agent.http.port=Available ports
 /data/inlong-agent/^\\d+(\\.\\d+)? // Start with one or more digits, followed by. or end with one. or more digits (? stands for optional, can match Examples: "5", "1.5" and "2.21"
 
 
-## 5. Support to get data time from file name
+## 5 Support to get data time from file name
 
 Agent supports obtaining the time from the file name as the production time of the data. The configuration instructions are as follows:
 /data/inlong-agent/***YYYYMMDDHH***
@@ -143,7 +143,7 @@ curl --location --request POST'http://localhost:8008/config/job' \
 }'
 ```
 
-## 6. Support time offset reading
+## 6 Support time offset reading
 
 After the configuration is read by time, if you want to read data at other times than the current time, you can configure the time offset to complete
 Configure the job attribute name as job.timeOffset, the value is number + time dimension, time dimension includes day and hour
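
Note: the job-submission endpoint shown in this diff (`http://localhost:8008/config/job`) is a plain HTTP POST, so it can also be driven from code. Below is a minimal sketch assuming the agent HTTP service is enabled as in section 3.1 and listening on the default port; the JSON payload is a placeholder that must be filled with the full job configuration (directory pattern, proxy.streamId, job.timeOffset, and so on) described in this document.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SubmitAgentJob {
    public static void main(String[] args) throws Exception {
        // Placeholder payload: fill in the real job configuration
        // (directory pattern, proxy.streamId, job.timeOffset, ...) documented above.
        String jobJson = "{}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8008/config/job")) // agent.http.port from agent.properties
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jobJson))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```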

docs/modules/dataproxy-sdk/architecture.md (+8 -8)

@@ -1,16 +1,16 @@
 ---
 title: Architecture
 ---
-# 1、intro
+## 1 intro
 When the business uses the message access method, the business generally only needs to format the data in a proxy-recognizable format (such as six-segment protocol, digital protocol, etc.)
 After group packet transmission, data can be connected to inlong. But in order to ensure data reliability, load balancing, and dynamic update of the proxy list and other security features
 The user program needs to consider more and ultimately leads to the program being too cumbersome and bloated.
 
 The original intention of API design is to simplify user access and assume some reliability-related logic. After the user integrates the API in the service delivery program, the data can be sent to the proxy without worrying about the grouping format, load balancing and other logic.
 
-# 2、functions
+## 2 functions
 
-## 2.1 overall functions
+### 2.1 overall functions
 
 | function | description |
 | ---- | ---- |
@@ -22,17 +22,17 @@ The original intention of API design is to simplify user access and assume some
 | proxy list persistence (new)| Persist the proxy list according to the business group id to prevent the configuration center from failing to send data when the program starts
 
 
-## 2.2 Data transmission function description
+### 2.2 Data transmission function description
 
-### Synchronous batch function
+#### Synchronous batch function
 
 public SendResult sendMessage(List<byte[]> bodyList, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)
 
 Parameter Description
 
 bodyListIt is a collection of multiple pieces of data that users need to send. The total length is recommended to be less than 512k. groupId represents the service id, and streamId represents the interface id. dt represents the time stamp of the data, accurate to the millisecond level. It can also be set to 0 directly, and the api will get the current time as its timestamp in the background. timeout & timeUnit: These two parameters are used to set the timeout time for sending data, and it is generally recommended to set it to 20s.
 
-### Synchronize a single function
+#### Synchronize a single function
 
 public SendResult sendMessage(byte[] body, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)
 
@@ -41,7 +41,7 @@ The original intention of API design is to simplify user access and assume some
 body is the content of a single piece of data that the user wants to send, and the meaning of the remaining parameters is basically the same as the batch sending interface.
 
 
-### Asynchronous batch function
+#### Asynchronous batch function
 
 public void asyncSendMessage(SendMessageCallback callback, List<byte[]> bodyList, String groupId, String streamId, long dt, long timeout,TimeUnit timeUnit)
 
@@ -50,7 +50,7 @@ The original intention of API design is to simplify user access and assume some
 SendMessageCallback is a callback for processing messages. The bodyList is a collection of multiple pieces of data that users need to send. The total length of multiple pieces of data is recommended to be less than 512k. groupId is the service id, and streamId is the interface id. dt represents the time stamp of the data, accurate to the millisecond level. It can also be set to 0 directly, and the api will get the current time as its timestamp in the background. timeout and timeUnit are the timeout time for sending data, generally recommended to be set to 20s.
 
 
-### Asynchronous single function
+#### Asynchronous single function
 
 
 public void asyncSendMessage(SendMessageCallback callback, byte[] body, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)
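
Note: the synchronous batch signature shown in this diff can be exercised roughly as follows. This is a minimal sketch; the `MessageSenderLike` interface and the `SendResult` enum are placeholders that only mirror the documented signature so the example is self-contained, the group/stream ids and timeout are sample values, and construction of the real SDK sender object is not covered here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class SdkSendSketch {

    // Stand-in that mirrors the documented synchronous batch signature so the
    // sketch compiles on its own; the real dataproxy-sdk sender provides it.
    interface MessageSenderLike {
        SendResult sendMessage(List<byte[]> bodyList, String groupId, String streamId,
                               long dt, long timeout, TimeUnit timeUnit);
    }

    enum SendResult { OK, TIMEOUT, INVALID_DATA } // placeholder for the SDK result type

    static SendResult sendBatch(MessageSenderLike sender) {
        // Keep the total batch size under the recommended 512k.
        List<byte[]> bodyList = Arrays.asList(
                "record-1".getBytes(StandardCharsets.UTF_8),
                "record-2".getBytes(StandardCharsets.UTF_8));

        // groupId/streamId identify the business group and data stream configured in
        // inlong-manager; dt = 0 lets the api fill in the current time, and the
        // recommended timeout is about 20 seconds.
        return sender.sendMessage(bodyList, "test_group", "test_stream",
                0L, 20L, TimeUnit.SECONDS);
    }
}
```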

docs/modules/dataproxy/architecture.md (+4 -4)

@@ -1,22 +1,22 @@
 ---
 title: Architecture
 ---
-# 1、intro
+## 1 intro
 
 Inlong-dataProxy belongs to the inlong proxy layer and is used for data collection, reception and forwarding. Through format conversion, the data is converted into TDMsg1 format that can be cached and processed by the cache layer
 InLong-dataProxy acts as a bridge from the InLong collection end to the InLong buffer end. Dataproxy pulls the relationship between the business group id and the corresponding topic name from the manager module, and internally manages the producers of multiple topics
 The overall architecture of inlong-dataproxy is based on Apache Flume. On the basis of this project, inlong-bus expands the source layer and sink layer, and optimizes disaster tolerance forwarding, which improves the stability of the system.
 
 
-# 2、architecture
+## 2 architecture
 
 ![](img/architecture.png)
 
 1. The source layer opens port monitoring, which is realized through netty server. The decoded data is sent to the channel layer
 2. The channel layer has a selector, which is used to choose which type of channel to go. If the memory is eventually full, the data will be processed.
 3. The data of the channel layer will be forwarded through the sink layer. The main purpose here is to convert the data to the TDMsg1 format and push it to the cache layer (tube is more commonly used here)
 
-# 3、DataProxy support configuration instructions
+## 3 DataProxy support configuration instructions
 
 DataProxy supports configurable source-channel-sink, and the configuration method is the same as the configuration file structure of flume:
 
@@ -158,7 +158,7 @@ agent1.sinks.meta-sink-more1.max-survived-size = 3000000
 Maximum number of caches
 ```
 
-# 4、Monitor metrics configuration instructions
+## 4 Monitor metrics configuration instructions
 
 DataProxy provide monitor indicator based on JMX, user can implement the code that read the metrics and report to user-defined monitor system.
 Source-module and Sink-module can add monitor metric class that is the subclass of org.apache.inlong.commons.config.metrics.MetricItemSet, and register it to MBeanServer. User-defined plugin can get module metric with JMX, and report metric data to different monitor system.
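
Note: since DataProxy exposes its metrics as JMX MBeans, a user-defined reporter can poll them with the standard `javax.management` API. Below is a minimal sketch for a reporter running inside the same JVM; the `org.apache.inlong.*` ObjectName filter is an assumption and should be adjusted to the names actually registered by the MetricItemSet subclasses, and a remote reporter would attach through a JMXConnector instead of the platform MBeanServer.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Polls the local MBeanServer for DataProxy metric MBeans and prints their
// attributes; a real plugin would forward these values to its monitor system.
public class JmxMetricPoller {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Assumed domain filter; adjust it to the ObjectNames used by the
        // registered metric item sets.
        ObjectName filter = new ObjectName("org.apache.inlong.*:*");
        for (ObjectName name : server.queryNames(filter, null)) {
            for (MBeanAttributeInfo attr : server.getMBeanInfo(name).getAttributes()) {
                Object value = server.getAttribute(name, attr.getName());
                System.out.println(name + " " + attr.getName() + "=" + value);
            }
        }
    }
}
```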

docs/modules/dataproxy/quick_start.md (+7 -7)

@@ -1,11 +1,11 @@
 ---
 title: Build && Deployment
 ---
-## Deploy DataProxy
+## 1 Deploy DataProxy
 
 All deploying files at `inlong-dataproxy` directory.
 
-### config TubeMQ master
+### 1.1 config TubeMQ master
 
 `tubemq_master_list` is the rpc address of TubeMQ Master.
 ```
@@ -14,33 +14,33 @@ $ sed -i 's/TUBE_LIST/tubemq_master_list/g' conf/flume.conf
 
 notice that conf/flume.conf FLUME_HOME is proxy the directory for proxy inner data
 
-### Environmental preparation
+### 1.2 Environmental preparation
 
 ```
 sh prepare_env.sh
 ```
 
-### config manager web url
+### 1.3 config manager web url
 
 configuration file: `conf/common.properties`:
 ```
 # manager web
 manager_hosts=ip:port
 ```
 
-## run
+## 2 run
 
 ```
 sh bin/start.sh
 ```
 
 
-## check
+## 3 check
 ```
 telnet 127.0.0.1 46801
 ```
 
-## Add DataProxy configuration to InLong-Manager
+## 4 Add DataProxy configuration to InLong-Manager
 
 After installing the DataProxy, you need to insert the IP and port of the DataProxy service is located into the backend database of InLong-Manager.

docs/modules/manager/architecture.md (+5 -5)

@@ -2,19 +2,19 @@
 title: Architecture
 ---
 
-## Introduction to Apache InLong Manager
+## 1 Introduction to Apache InLong Manager
 
 + Target positioning: Apache inlong is positioned as a one-stop data access solution, providing complete coverage of big data access scenarios from data collection, transmission, sorting, and technical capabilities.
 
 + Platform value: Users can complete task configuration, management, and indicator monitoring through the platform's built-in management and configuration platform. At the same time, the platform provides SPI extension points in the main links of the process to implement custom logic as needed. Ensure stable and efficient functions while lowering the threshold for platform use.
 
 + Apache InLong Manager is the user-oriented unified UI of the entire data access platform. After the user logs in, it will provide different function permissions and data permissions according to the corresponding role. The page provides maintenance portals for the platform's basic clusters (such as mq, sorting), and you can view basic maintenance information and capacity planning adjustments at any time. At the same time, business users can complete the creation, modification and maintenance of data access tasks, and index viewing and reconciliation functions. The corresponding background service will interact with the underlying modules when users create and start tasks, and deliver the tasks that each module needs to perform in a reasonable way. Play the role of coordinating the execution process of the serial back-end business.
-## Architecture
+## 2 Architecture
 
 ![](img/inlong-manager.png)
 
 
-##Module division of labor
+## 3 Module division of labor
 
 | Module | Responsibilities |
 | :----| :---- |
@@ -24,9 +24,9 @@ title: Architecture
 | manager-web | Front-end interactive response interface |
 | manager-workflow-engine | Workflow Engine |
 
-## use process
+## 4 use process
 ![](img/interactive.jpg)
 
 
-## data model
+## 5 data model
 ![](img/datamodel.jpg)

docs/modules/manager/quick_start.md (+6 -6)

@@ -2,7 +2,7 @@
 title: Build && Deployment
 ---
 
-# 1. Environmental preparation
+## 1 Environmental preparation
 - Install and start MySQL 5.7+, copy the `doc/sql/apache_inlong_manager.sql` file in the inlong-manager module to the
 server where the MySQL database is located (for example, copy to `/data/` directory), load this file through the
 following command to complete the initialization of the table structure and basic data:
@@ -25,15 +25,15 @@ title: Build && Deployment
 to [Compile and deploy TubeMQ Manager](https://inlong.apache.org/zh-cn/docs/modules/tubemq/tubemq-manager/quick_start.html)
 , install and start TubeManager.
 
-# 2. Deploy and start manager-web
+## 2 Deploy and start manager-web
 
 **manager-web is a background service that interacts with the front-end page.**
 
-## 2.1 Prepare installation files
+### 2.1 Prepare installation files
 
 All installation files at `inlong-manager-web` directory.
 
-## 2.2 Modify configuration
+### 2.2 Modify configuration
 
 Go to the decompressed `inlong-manager-web` directory and modify the `conf/application.properties` file:
 
@@ -74,7 +74,7 @@ The dev configuration is specified above, then modify the `conf/application-dev.
 sort.appName=inlong_app
 ```
 
-## 2.3 Start the service
+### 2.3 Start the service
 
 Enter the decompressed directory, execute `sh bin/startup.sh` to start the service, and check the
 log `tailf log/manager-web.log`. If a log similar to the following appears, the service has started successfully:
@@ -83,7 +83,7 @@ log `tailf log/manager-web.log`. If a log similar to the following appears, the
 Started InLongWebApplication in 6.795 seconds (JVM running for 7.565)
 ```
 
-# 3. Service access verification
+## 3 Service access verification
 
 Verify the manager-web service:

docs/modules/sort/introduction.md (+9 -9)

@@ -7,31 +7,31 @@ Inlong-sort is used to extract data from different source systems, then transfor
 Inlong-sort is simply an Flink application, and relys on Inlong-manager to manage meta data(such as the source informations and storage informations)
 
 # features
-## multi-tenancy
+## 1 multi-tenancy
 Inlong-sort is an multi-tenancy system, which means you can extract data from different sources(these sources must be of the same source type) and load data into different sinks(these sinks must be of the same storage type).
 e.g. you can extract data form different topics of inlong-tubemq and the load them to different hive clusters.
 
-## change meta data without restart
+## 2 change meta data without restart
 Inlong-sort uses zookeeper to manage its meta data, every time you change meta data on zk, inlong-sort application will be informed immediately.
 e.g if you want to change the schema of your data, just change the meta data on zk without restart your inlong-sort application.
 
-# supported sources
+## 3 supported sources
 - inlong-tubemq
 - pulsar
 
-# supported storages
+## 4 supported storages
 - clickhouse
 - hive (Currently we just support parquet file format)
 
-# limitations
+## 5 limitations
 Currently, we just support extracting specified fields in the stage of **Transform**.
 
-# future plans
-## More kinds of source systems
+## 6 future plans
+### 6.1 More kinds of source systems
 kafka and etc
 
-## More kinds of storage systems
+### 6.2 More kinds of storage systems
 Hbase, Elastic Search, and etc
 
-## More kinds of file format in hive sink
+### 6.3 More kinds of file format in hive sink
 sequence file, orc

docs/modules/sort/protocol_introduction.md (+2 -2)

@@ -7,7 +7,7 @@ Currently the metadata management of inlong-sort relies on inlong-manager.
 
 Metadata interaction between inlong-sort and inlong-manager is performed via ZK.
 
-# Zookeeper's path structure
+## 1 Zookeeper's path structure
 
 ![img.png](img.png)
 
@@ -20,6 +20,6 @@ A path at the top of the figure indicates which dataflow are running in a cluste
 
 The path below is used to store the details of the dataflow.
 
-# Protocol
+## 2 Protocol
 Please reference
 `org.apache.inlong.sort.protocol.DataFlowInfo`
